Efficient, Accurate and Privacy-Preserving Data Mining for Frequent Itemsets in Distributed Databases

Authors

  • Adriano Veloso
  • Wagner Meira
  • Srinivasan Parthasarathy
  • Márcio de Carvalho

Abstract

Mining distributed databases is emerging as a fundamental computational problem. A common approach to mining distributed databases is to move all of the data from each database to a central site, where a single model is built. This approach is accurate, but too expensive in terms of the time required. For this reason, several approaches have been developed to mine distributed databases efficiently, but they still ignore a key issue: privacy. Privacy is the right of individuals or organizations to keep their own information secret. Privacy concerns can prevent data movement: data may be distributed among several custodians, none of which is allowed to transfer its data to another site. In this paper we present an efficient approach for mining frequent itemsets in distributed databases. Our approach is accurate and uses a privacy-preserving communication mechanism. The proposed approach is also efficient in terms of message-passing overhead, requiring only one round of communication during the mining operation. We show that our privacy-preserving distributed approach outperforms the application of a well-known mining algorithm to distributed databases.
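
The abstract does not describe the privacy-preserving communication mechanism itself, so the following is only a minimal sketch of one well-known way to obtain exact global support counts without any site revealing its raw counts: a ring-based secure sum with random masking. It is not presented as the authors' method. It assumes horizontally partitioned data, publicly known site sizes, a public candidate list, and semi-honest sites that do not collude; the function names (`local_counts`, `secure_vector_sum`, `mine_global_frequent`) are hypothetical, and the ring is simulated in a single process. All candidate counts travel in one vector, so the ring is traversed once, loosely mirroring the single communication round claimed above.

```python
import secrets
from itertools import combinations


def local_counts(transactions, candidates):
    """Support count of each candidate itemset within ONE site's transactions.

    Raw counts never leave the site; only masked sums are transmitted.
    """
    return [sum(1 for t in transactions if c <= t) for c in candidates]


def secure_vector_sum(site_vectors, modulus):
    """Simulate a single pass of a ring-based secure sum over count vectors.

    The initiating site adds a random mask to each of its counts, every other
    site adds its own counts modulo `modulus` and forwards the message, and
    the initiator finally subtracts the masks. A single (non-colluding) site
    in the ring only ever sees values hidden by uniformly random offsets.
    """
    masks = [secrets.randbelow(modulus) for _ in site_vectors[0]]
    # Message leaving the initiator: its own counts hidden under the masks.
    message = [(m + c) % modulus for m, c in zip(masks, site_vectors[0])]
    # One hop per remaining site; each adds its counts and passes it on.
    for vector in site_vectors[1:]:
        message = [(x + c) % modulus for x, c in zip(message, vector)]
    # Back at the initiator: remove the masks to recover the exact sums.
    return [(x - m) % modulus for x, m in zip(message, masks)]


def mine_global_frequent(sites, candidates, minsup):
    """Return {itemset: global count} for candidates meeting `minsup`."""
    total_tx = sum(len(s) for s in sites)  # ASSUMPTION: site sizes are public
    modulus = total_tx + 1                  # larger than any possible global count
    vectors = [local_counts(s, candidates) for s in sites]
    global_counts = secure_vector_sum(vectors, modulus)
    return {c: n for c, n in zip(candidates, global_counts)
            if n >= minsup * total_tx}


if __name__ == "__main__":
    # Three hypothetical sites, each holding its own transactions.
    sites = [
        [frozenset("ab"), frozenset("abc"), frozenset("c")],
        [frozenset("ab"), frozenset("b")],
        [frozenset("abc"), frozenset("ac")],
    ]
    items = sorted(set().union(*(t for s in sites for t in s)))
    # Public candidate list (assumed): all 1- and 2-itemsets over the items.
    candidates = [frozenset(c) for k in (1, 2) for c in combinations(items, k)]
    # 7 transactions with minsup 0.5, so an itemset needs a global count >= 4.
    for itemset, count in mine_global_frequent(sites, candidates, 0.5).items():
        print(sorted(itemset), count)
```

On this toy data the sketch reports {a}, {b}, {c} and {a, b} as globally frequent, with counts 5, 5, 4 and 4; {a, c} (3) and {b, c} (2) fall below the threshold. Because the masks are removed exactly, the result matches what a centralized count would give, which is the accuracy property the abstract emphasizes. The design choice has a known limitation: the two neighbours of a site in the ring can combine their messages to recover that site's counts, so this sketch only holds up when sites do not collude.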

Similar resources

Data sanitization in association rule mining based on impact factor

Data sanitization is a process that is used to promote the sharing of transactional databases among organizations and businesses; it alleviates concerns for individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...

Mining Frequent Itemsets in Distorted Databases with Granular Computing

Data perturbation is one popular method to achieve privacy-preserving data mining. However, distorted databases bring enormous overheads to mining algorithms as compared to original databases. In this paper, we present the GrC-FIM algorithm to address the efficiency problem in mining frequent itemsets from distorted databases. Two measures are introduced to overcome the weakness in existing wor...

Privacy-Preserving Mining of Association Rules on Distributed Databases

Data mining techniques can extract hidden but useful information from large databases. Most efficient approaches for mining distributed databases suppose that all of the data at each site can be shared. However, source transaction databases usually include very sensitive information. In order to obtain an accurate mining result on distributed databases and to preserve the private data that is a...

Efficient Data Mining for Frequent Itemsets in Dynamic and Distributed Databases

Data Mining is one of the central activities associated with understanding and exploiting the world of digital data. It is the mechanized process of modeling large databases by means of discovering useful patterns. A frequent itemset is a pattern describing a relevant subset of the data, and a collection of frequent itemsets is particularly useful because it is an extremely compact model of the...

Privacy-preserving algorithms for distributed mining of frequent itemsets

Standard algorithms for association rule mining are based on identification of frequent itemsets. In this paper, we study how to maintain privacy in distributed mining of frequent itemsets. That is, we study how two (or more) parties can find frequent itemsets in a distributed database without revealing each party’s portion of the data to the other. The existing solution for vertically partitio...

Publication date: 2003